
Conversation

@Nicolas-Iskos commented Mar 10, 2022

Erase functionality is now supported for static_map using an additional sentinel value. Erased slots can be reused by future insertions. Users opt into erase support by providing an erased_key_sentinel during construction. Included below are plots showing the performance of erase and demonstrating that neither insert nor find incurs a performance regression as a result of adding erase support. For each plot, num_keys = 1E8.
[Plots: insert throughput vs. load factor; search-all throughput vs. load factor; erase-all throughput vs. load factor (Gaussian 32-bit and 64-bit key/value pairs); erase-none throughput vs. load factor (unique 32-bit and 64-bit key/value pairs)]
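For context, here is a rough usage sketch of the feature as described above. The constructor overload taking an erased-key sentinel, the bulk erase(first, last) signature, and the cuco::pair_type alias are assumptions based on this description, not a verbatim copy of the merged API.

```cpp
// Rough sketch only: the constructor overload, bulk erase() signature, and
// cuco::pair_type alias are assumed from the PR description.
#include <cuco/static_map.cuh>
#include <thrust/device_vector.h>
#include <thrust/sequence.h>

#include <cstddef>
#include <vector>

int main()
{
  using Key   = int;
  using Value = int;

  // The empty sentinel marks never-used slots; the erased sentinel marks slots
  // freed by erase(). They must be distinct and must never occur as real keys.
  Key constexpr empty_key_sentinel     = -1;
  Value constexpr empty_value_sentinel = -1;
  Key constexpr erased_key_sentinel    = -2;

  std::size_t constexpr num_keys = 1000;

  // Passing an erased_key_sentinel at construction enables erase support.
  cuco::static_map<Key, Value> map{
    2 * num_keys, empty_key_sentinel, empty_value_sentinel, erased_key_sentinel};

  // Insert the pairs (0,0) ... (999,999).
  std::vector<cuco::pair_type<Key, Value>> h_pairs;
  for (Key k = 0; k < static_cast<Key>(num_keys); ++k) {
    h_pairs.push_back({k, k});
  }
  thrust::device_vector<cuco::pair_type<Key, Value>> d_pairs(h_pairs);
  map.insert(d_pairs.begin(), d_pairs.end());

  // Erase the first half of the keys; the freed slots can be reused by
  // subsequent insertions.
  thrust::device_vector<Key> d_erase_keys(num_keys / 2);
  thrust::sequence(d_erase_keys.begin(), d_erase_keys.end(), Key{0});
  map.erase(d_erase_keys.begin(), d_erase_keys.end());

  return 0;
}
```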

@GPUtester

Can one of the admins verify this patch?

@jrhemstad
Collaborator

add to whitelist

@jrhemstad
Collaborator

ok to test

@jrhemstad
Collaborator

@Nicolas-Iskos could you include some of the performance graphs in the PR description?

@jrhemstad (Collaborator) left a comment


Can you update the summary documentation above the static_map class to include a discussion of erase? (A rough sketch of such a note follows below.) Include:

  • Limitations about concurrent insert/find/erase
  • Limitations for erase when an erased sentinel was not provided
    • Enforced for the bulk API, but not the device API
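For reference, one possible shape of such a note, with hypothetical wording; the concrete guarantees are for the PR author to spell out in the actual class summary:

```cpp
/**
 * ... existing static_map class summary ...
 *
 * @note erase() is only usable when an erased_key_sentinel distinct from the
 *       empty_key_sentinel is supplied at construction. The bulk erase API can
 *       enforce this precondition; the device-side API does not check it.
 *
 * @note Concurrent mixes of insert, find, and erase are subject to limitations;
 *       document here which combinations are safe to run concurrently.
 */
```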

@PointKernel added the type: feature request and topic: static_map labels on Mar 10, 2022
@PointKernel
Member

First commit from auto formatter! 🎉

@sleeepyjack (Collaborator) left a comment


Thanks for implementing this feature!
I've stepped through the core functionality and compared it to WarpCore's SingleValueHashTable implementation. tl;dr: looks good. 👍

@PointKernel
Member

> We can eliminate the stride parameter by directly inferring the optimal grid_size for a given architecture:

Depending on the use case, neither approach is always better. If I remember correctly, detail::get_grid_size is very efficient when map occupancy is low. cudaOccupancyMaxActiveBlocksPerMultiprocessor tends to return a relatively small grid size, so each thread handles multiple input elements, as opposed to one element per thread with the manually calculated grid size.
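For illustration, a minimal sketch of the occupancy-based sizing discussed here (not cuco's actual detail::get_grid_size): query cudaOccupancyMaxActiveBlocksPerMultiprocessor, size the grid to saturate the device, and let each thread cover multiple elements via a grid-stride loop. The kernel and constants are placeholders.

```cpp
#include <cuda_runtime.h>
#include <cstddef>
#include <cstdio>

// Placeholder kernel: a grid-stride loop lets a grid smaller than the input
// still cover every element, with each thread handling multiple elements.
__global__ void insert_kernel(std::size_t n)
{
  for (std::size_t i = static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
       i < n;
       i += static_cast<std::size_t>(gridDim.x) * blockDim.x) {
    // ... process element i, e.g. insert a key into the map (omitted) ...
  }
}

int main()
{
  constexpr int block_size = 128;

  int device = 0;
  cudaGetDevice(&device);

  int num_sms = 0;
  cudaDeviceGetAttribute(&num_sms, cudaDevAttrMultiProcessorCount, device);

  int max_blocks_per_sm = 0;
  cudaOccupancyMaxActiveBlocksPerMultiprocessor(
    &max_blocks_per_sm, insert_kernel, block_size, 0 /* dynamic shared memory */);

  // Occupancy-derived grid: saturates the device rather than launching one
  // thread per input element.
  int const grid_size = num_sms * max_blocks_per_sm;
  std::printf("grid_size = %d blocks of %d threads\n", grid_size, block_size);

  std::size_t const n = 100000000;  // matches the num_keys = 1E8 benchmark scale
  insert_kernel<<<grid_size, block_size>>>(n);
  cudaDeviceSynchronize();

  return 0;
}
```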

@PointKernel (Member) left a comment


@Nicolas-Iskos Thanks for the care and persistence you've put into this PR! Looks great to me! Depending on how you see it, we can leave the benchmark refactoring to a separate PR.

@Nicolas-Iskos
Author

> @Nicolas-Iskos Thanks for the care and persistence you've put into this PR! Looks great to me! Depending on how you see it, we can leave the benchmark refactoring to a separate PR.

Of course, and thank you guys for all the feedback! I'd be fine leaving the benchmark refactoring to a separate PR. I could even do that with the PR for dynamic_map::erase, where we could refactor the dynamic map benchmark as well.

@PointKernel
Member

> I could even do that with the PR for dynamic_map::erase, where we could refactor the dynamic map benchmark as well.

We can do this once NVIDIA/nvbench#80 is merged. zip_axes can simplify the code a lot.
